Information foraging with an oracle

During ecological decisions, such as when foraging for food or selecting a weekend activity, we often have to balance the costs and benefits of exploiting known options versus exploring novel ones. Here, we ask how individuals address such cost-benefit tradeoffs during tasks in which we can either explore by ourselves or seek external advice from an oracle (e.g., a domain expert or recommendation system). To answer this question, we designed two studies in which participants chose between paying a cost to obtain expert advice from an oracle and searching for options without guidance, under manipulations affecting the optimal choice. We found that participants showed a greater propensity to seek expert advice when it was instrumental to increasing payoff (study A), and when it reduced choice uncertainty, above and beyond payoff maximization (study B). This latter result was especially apparent in participants with greater trait-level intolerance of uncertainty. Taken together, these results suggest that we seek expert advice for both economic goals (i.e., payoff maximization) and epistemic goals (i.e., uncertainty minimization), and that our decisions to ask or not ask for advice are sensitive to cost-benefit tradeoffs.


The best non-adaptive strategy
We now show that, given any distribution of gem values, at least one of the NC and OF strategies is optimal (in terms of its expected gain) in the set of non-adaptive strategies.
Theorem 1. In a generic game, at least one of the NC and OF strategies is optimal in the set of non-adaptive strategies.
Proof. First of all, we prove that there exists an optimal non-adaptive strategy that is deterministic (i.e., non-randomized). Indeed, take an optimal non-adaptive strategy $\sigma$. Since $\sigma$ could be randomized, a run of $\sigma$ will either avoid calling the oracle, or call it in the first round, or in the second round, ..., or in round $t$. Let $p_\infty$ be the probability that $\sigma$ avoids calling the oracle and, for $i = 1, \ldots, t$, let $p_i$ be the probability that $\sigma$ calls the oracle in round $i$. Then, if $E_i$ is the expected gain conditioned on $\sigma$ calling the oracle in round $i$, and if $E_\infty$ is the expected gain conditioned on $\sigma$ not calling the oracle, we have that the expected gain of $\sigma$ is

$$E_\sigma = p_\infty E_\infty + \sum_{i=1}^{t} p_i E_i.$$

Let $i^\star \in \{1, \ldots, t, \infty\}$ be an index such that $E_{i^\star} \ge E_i$ for each $i \in \{1, \ldots, t, \infty\}$. Observe that there might be multiple $i^\star$'s with this property. Then, the strategy that deterministically calls the oracle in round $i^\star$ (or, if $i^\star = \infty$, the NC strategy) has an expected gain of $E_{i^\star}$ which, by $E_{i^\star} \ge E_i$ for each $i$, and by $p_1, \ldots, p_t, p_\infty$ being a probability distribution, entails that $E_{i^\star} \ge E_\sigma$. Thus, we have created a deterministic non-adaptive strategy that is no worse than the chosen (optimal) randomized non-adaptive strategy.
We can then assume that there exists an optimal non-adaptive strategy in the set $S = \{\sigma_1, \ldots, \sigma_t, \sigma_\infty\}$, i.e., the set that contains, for $i \in \{1, \ldots, t\}$, the deterministic strategy that calls the oracle in round $i$ ($\sigma_i$), as well as the NC strategy ($\sigma_\infty$). Observe that $\sigma_1$ is the OF strategy.
November 20, 2023

Recall that, for $i \in \{1, \ldots, t\}$, we let $E_i$ be the expected gain of the $\sigma_i$ strategy, and we let $E_\infty$ be the expected gain of $\sigma_\infty$. (Then, $E_{OF} = E_1$ and $E_{NC} = E_\infty$.) We now show that $\max(E_1, E_\infty) \ge E_i$ for each $i \in \{1, \ldots, t, \infty\}$; this will prove our main assertion. We consider two cases:
• First, consider the strategy $\sigma_i$, for $i \in \{2, 3, \ldots, t-c\}$. This strategy, in its first $i-1$ rounds, picks a uniform-at-random subset $A$ of gems conditioned on $|A| = i-1$ and, in its last $t-c-i+1$ rounds, picks $t-c-i+1$ gems of maximum value (among the remaining ones). In particular, then, strategy $\sigma_i$ picks a subset of $t-c$ gems; given that strategy $\sigma_1$ picks a subset of $t-c$ gems of maximum total value out of the full set of gems, it must be that $E_i \le E_1$. It follows that, for each $i \in \{2, 3, \ldots, t-c\}$, $\sigma_i$ is no better than $\sigma_1$.
• Now, consider the strategy $\sigma_i$, for $i \in \{t-c+1, t-c+2, \ldots, t\}$. The strategy $\sigma_i$ is unable to collect any gem after its call to the oracle, since the oracle requires $c$ time units, and only $t$ time units are available. Thus, $\sigma_i$ will only collect $i-1$ uniform-at-random gems. Recall that the strategy $\sigma_\infty$ collects $t$ uniform-at-random gems. Since $i-1 \le t-1 < t$, and since no gem has negative value, it must hold that $E_i \le E_\infty$. (If gems of negative value are present, this property might fail to hold: e.g., if each gem has value $-1$, $\sigma_\infty$ has an expected gain of $-t$ whereas each other $\sigma_i$ has an expected gain of at least $-t+1 > -t$.)
It follows that (at least) one of $\sigma_1$ and $\sigma_\infty$ (that is, of OF and NC) is an optimal strategy of $S$ and, as a consequence, it is an optimal non-adaptive strategy.
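Theorem 1 can be sanity-checked by brute force on a small instance. The sketch below is illustrative only: the function names and the example instance are assumptions introduced here, not part of the source, and the enumeration is exponential in the number of gems. It assumes non-negative gem values and at least t gems on the map, and computes the exact expected gain of each deterministic strategy that calls the oracle in round i, together with that of the NC strategy.

```python
from fractions import Fraction
from itertools import combinations


def gain_oracle_in_round(values, t, c, i):
    """Exact expected gain of sigma_i: i - 1 random picks, then the oracle."""
    n = len(values)
    # Rounds left once the oracle (cost c) has answered; zero if it answers too late.
    rounds_after = max(t - c - (i - 1), 0)
    total = Fraction(0)
    subsets = 0
    # Every (i - 1)-subset of gems is equally likely to be the randomly picked one.
    for picked in combinations(range(n), i - 1):
        picked_set = set(picked)
        rest = sorted((values[j] for j in range(n) if j not in picked_set), reverse=True)
        total += sum(values[j] for j in picked) + sum(rest[:rounds_after])
        subsets += 1
    return total / subsets


def gain_never_call(values, t):
    """Exact expected gain of NC: t uniform-at-random gems (assumes t <= len(values))."""
    return Fraction(sum(values)) * t / len(values)


# Hypothetical instance: three gems of value 1 and one of value 6.
values, t, c = [1, 1, 1, 6], 3, 1
gains = {i: gain_oracle_in_round(values, t, c, i) for i in range(1, t + 1)}
gains["inf"] = gain_never_call(values, t)
best_endpoint = max(gains[1], gains["inf"])
```

On this instance no intermediate strategy beats the better of OF (`gains[1]`) and NC (`gains["inf"]`), as the theorem predicts.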
Observe that the expected gain of the NC strategy, $E_{NC} = E_\infty$, is equal to $t$ times the average gem value, and that the expected gain of the OF strategy, $E_{OF} = E_1$, is equal to the total value of the $t-c$ gems of highest value.
These two observations, together with Theorem 1, make it possible to efficiently compute the expected gain of the optimal non-adaptive strategy: an algorithm could just return the larger of $E_1$ and $E_\infty$.
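The two closed forms above translate directly into a few lines of code. This is a minimal sketch, assuming a non-empty list of non-negative gem values; the function names are hypothetical, and exact rational arithmetic is used only to avoid rounding noise when comparing the two gains.

```python
from fractions import Fraction


def expected_gain_nc(values, t):
    """NC strategy: t uniform-at-random gems, i.e. t times the average value."""
    picks = min(t, len(values))  # cannot collect more gems than exist
    return Fraction(sum(values)) * picks / len(values)


def expected_gain_of(values, t, c):
    """OF strategy: pay c time units up front, then take the t - c best gems."""
    k = max(t - c, 0)
    return Fraction(sum(sorted(values, reverse=True)[:k]))


def best_non_adaptive(values, t, c):
    """Return (strategy name, expected gain) for the better of NC and OF."""
    candidates = [
        ("NC", expected_gain_nc(values, t)),
        ("OF", expected_gain_of(values, t, c)),
    ]
    return max(candidates, key=lambda pair: pair[1])
```

For example, `best_non_adaptive([1, 1, 1, 6], 3, 1)` selects OF, while `best_non_adaptive([5, 5, 5, 5], 3, 2)` selects NC, since the oracle cannot help when all gems are equally valuable.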

The best adaptive strategy
We now give an algorithm, based on the well-known dynamic programming technique, for computing an optimal adaptive strategy.
Let $S$ be a multiset of gem values; imagine $S$ to be the multiset of values of the gems that are still on the map, after some number of moves. If, at some point in the past, the participant has called the oracle, the optimal move for $S$ is easy to compute: the participant should collect one of the gems of highest value among those in $S$. Otherwise, the participant should decide whether to call the oracle in the next move, or whether to collect a uniform-at-random gem.
Let us define $O_{t,c,S}$ to be the maximum expected gain that an adaptive strategy can achieve (i) with $t$ time units available, (ii) with an oracle cost of $c$, (iii) on a map containing gems whose multiset of values is $S$.
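Clearly, if $t \le 0$, or if $S = \emptyset$, then $O_{t,c,S} = 0$: if there are no moves, or no gems, available, then the participant cannot collect any gem, and ends up with a null total gain. Otherwise, we have that

$$O_{t,c,S} = \max\left( \frac{1}{|S|} \sum_{v \in S} \left( v + O_{t-1,c,S - \{v\}} \right),\; T_{t-c}(S) \right),$$

where the sum ranges over the elements of $S$ with multiplicity, and $T_{t-c}(S)$ is the total value of the $t-c$ gems in $S$ of highest value, or the total value of the gems in $S$ if $|S| < t-c$ (and, $T_k(S) = 0$ if $k \le 0$).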
The recurrence chooses the best of the two options available to the participant: either collect a uniform-at-random gem (and obtain the value $v$ of that gem plus the optimal gain that can be achieved with $t-1$ units of time on a map with gem values $S - \{v\}$), or call the oracle (and obtain the total value $T_{t-c}(S)$ of the $t-c$ gems of highest value in $S$). Clearly, the optimal strategy would select an option that results in the largest expected value. Thus, we have expressed the value of $O_{t,c,S}$ for the base cases ($S = \emptyset$, or $t \le 0$), and given a recurrence for the other cases. Unwinding the recurrence gives rise to a dynamic program that computes the maximum expected gain achievable by an adaptive strategy.
It is easy to observe that, for a set of $n$ gems, this dynamic program runs in time $O(n \cdot 2^n)$. In fact, if there are only $k$ distinct gem values $v_1, \ldots, v_k$, and if the generic value $v_i$ is shared by $g_i$ gems, we have that $n = \sum_{i=1}^{k} g_i$ and that the dynamic program can also be implemented to run in time $O(k \cdot n^k)$; that is, if there are constantly many distinct gem values, the runtime becomes polynomial.
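The recurrence for $O_{t,c,S}$ can be sketched as a memoized recursion. The code below is one possible implementation under stated assumptions, not the authors' own: the function names are hypothetical, the multiset $S$ is represented as a sorted tuple so that maps with the same remaining values share a cache entry, and no attempt is made to reach the $O(k \cdot n^k)$ bound exactly.

```python
from fractions import Fraction
from functools import lru_cache


def top_k_total(values, k):
    """T_k(S): total value of the k most valuable gems (0 if k <= 0)."""
    if k <= 0:
        return 0
    return sum(sorted(values, reverse=True)[:k])


def optimal_adaptive_gain(values, t, c):
    """Maximum expected gain of an adaptive strategy, following the recurrence."""

    @lru_cache(maxsize=None)
    def solve(time_left, remaining):
        # remaining is a sorted tuple holding the values of the gems still on the map.
        if time_left <= 0 or not remaining:
            return Fraction(0)
        # Option 1: call the oracle now, then collect the best time_left - c gems.
        call_oracle = Fraction(top_k_total(remaining, time_left - c))
        # Option 2: collect a uniform-at-random gem and continue optimally.
        total = Fraction(0)
        for idx, v in enumerate(remaining):
            rest = remaining[:idx] + remaining[idx + 1:]  # still sorted
            total += v + solve(time_left - 1, rest)
        explore = total / len(remaining)
        return max(explore, call_oracle)

    return solve(t, tuple(sorted(values)))
```

On the hypothetical instance `optimal_adaptive_gain([1, 1, 1, 6], 3, 1)` the result is 29/4, strictly above the best non-adaptive gain of 7 on the same instance: the optimal play explores once and calls the oracle only if the valuable gem was not found.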

The Adaptivity Gap
We have already argued that there exist settings where non-adaptivity makes the participant achieve no more than 96% of the optimal gain achievable by adaptive strategies. In this section we show that the gap can be larger: there are settings where non-adaptive users can collect no more than 91.73% of the optimal gain. Let $\phi = \frac{1+\sqrt{5}}{2} \approx 1.61803\ldots$ be the Golden ratio, and recall that $\frac{1}{\phi} = \phi - 1$. We define a Golden ratio-based game which will allow us to give a stronger bound on the adaptivity gap:
Definition 2 (Golden Game). For a large enough $n$, let the map be composed of $g_1 = n-1$ gems of value $v_1 = 1$, and $g_2 = 1$ gem of value $v_2 = \frac{n}{\phi}$; the player has $t = \frac{n}{\phi}$ time units available, and calling the oracle requires $c = \frac{n}{\phi^3}$ time units.
In particular, observe that $v_2$ and $t$ are approximately equal to $0.61803\ldots \cdot n$, and that $c \approx 0.23606\ldots \cdot n$.
We begin our analysis by computing the expected gain of the two extremal non-adaptive strategies on this game.
Lemma 3. Both the NC and the OF strategies induce an expected gain of $n \pm O(1)$ on the game of Definition 2.
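Proof. The NC strategy captures the gem of value $v_2$ with probability $\frac{t}{n}$, and captures either $t-1$, or $t$, gems of value $v_1$: its expected gain is then

$$E_{NC} = \frac{t}{n}\,(v_2 + t - 1) + \left(1 - \frac{t}{n}\right) t = t + \frac{t}{n}\,(v_2 - 1) = \frac{n}{\phi} + \frac{n}{\phi^2} \pm O(1) = n \pm O(1),$$

where the last step uses $\frac{1}{\phi} + \frac{1}{\phi^2} = 1$. The OF strategy, instead, captures the gem of value $v_2$ and $t-c-1$ gems of value $v_1$, with probability 1. Its total gain is then

$$E_{OF} = v_2 + t - c - 1 = \frac{2n}{\phi} - c - 1 = n \pm O(1),$$

where the last step uses $\frac{2}{\phi} - 1 = \sqrt{5} - 2 \approx 0.23606\ldots$

We can now bound the expected gain of optimal non-adaptive strategies.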
Corollary 4. The expected gain of optimal non-adaptive strategies for the game of Definition 2 is $n \pm O(1)$.
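Lemma 3 and Corollary 4 are easy to check numerically. The sketch below is a rough check under stated assumptions: the function names are hypothetical, $t$ and $c$ are rounded to integers, and the oracle cost is taken to be $(\sqrt{5} - 2) \cdot n$, which matches the stated approximation $c \approx 0.23606 \cdot n$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def golden_game(n):
    """Return (v2, t, c) for the Golden Game, rounding t and c to integers."""
    v2 = n / PHI
    t = round(n / PHI)
    c = round((math.sqrt(5) - 2) * n)  # assumption: c is about 0.23606 * n
    return v2, t, c


def nc_gain(n):
    """Expected gain of NC: t random picks among n - 1 unit gems and one big gem."""
    v2, t, _ = golden_game(n)
    p_high = t / n  # probability that the t random picks include the big gem
    return p_high * (v2 + t - 1) + (1 - p_high) * t


def of_gain(n):
    """Gain of OF: the big gem plus t - c - 1 unit gems."""
    v2, t, c = golden_game(n)
    return v2 + (t - c - 1)
```

For every `n` tried, both `nc_gain(n)` and `of_gain(n)` stay within a small additive constant of `n`, which is what the $n \pm O(1)$ bound states.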
We now show that there exists an adaptive strategy with an expected gain of $1.09016\ldots \cdot n$, roughly 9% more than the expected gain of optimal non-adaptive strategies.
Theorem 5. There exists an adaptive strategy for the game of Definition 2 with expected gain $\frac{5^{3/2} - 9}{2} \cdot n \pm O(1) \approx 1.09016\ldots \cdot n$.
Proof. Consider the following adaptive strategy: in the first phase of the game, the participant captures $t-c-1$ uniform-at-random gems. If the participant captures the high-value gem (i.e., that of value $v_2$) during the first phase, the participant will keep capturing uniform-at-random gems for the remainder of the game. Otherwise, right after the end of the first phase, the participant will (i) call the oracle, and (ii) capture the high-value gem, just before the game ends.
Observe that the behavior of the participant is determined by the event $\xi$ = "the high-value gem is captured during the first phase". The probability of $\xi$ is equal to $\frac{t-c-1}{n}$ (i.e., the probability of drawing a red ball in $t-c-1$ draws without replacement from an urn containing one red ball and $n-1$ blue balls). If $\xi$ happens, the participant captures the gem of value $v_2$, and $t-1$ gems of value $v_1$; if $\xi$ does not happen, the participant captures the gem of value $v_2$, and $t-c-1$ gems of value $v_1$. Thus, the expected gain of the participant is equal to

$$\Pr[\xi]\,(v_2 + t - 1) + (1 - \Pr[\xi])\,(v_2 + t - c - 1) = v_2 + t - 1 - c\left(1 - \frac{t-c-1}{n}\right) = \frac{5^{3/2} - 9}{2} \cdot n \pm O(1).$$

The final result of this section readily follows from the above bounds.
Corollary 6. There exist games with $n$ gems such that non-adaptive strategies can get no more than a fraction of $\frac{2}{5^{3/2} - 9} + O\left(\frac{1}{n}\right) \approx 0.91728\ldots$ of the optimal (adaptive) strategy's expected gain.
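The constants in Theorem 5 and Corollary 6 can be reproduced numerically from the gains derived above. This is an illustrative sketch under the same assumptions as before: `golden_game_gains` is a hypothetical helper, $t$ and $c$ are rounded to integers, and $c$ is taken to be $(\sqrt{5} - 2) \cdot n$, in line with $c \approx 0.23606 \cdot n$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def golden_game_gains(n):
    """Return (best non-adaptive gain, gain of the two-phase adaptive strategy)."""
    v2 = n / PHI
    t = round(n / PHI)
    c = round((math.sqrt(5) - 2) * n)  # assumption: c is about 0.23606 * n

    # Non-adaptive candidates: never call the oracle (NC) or call it first (OF).
    nc = (t / n) * (v2 + t - 1) + (1 - t / n) * t
    of = v2 + (t - c - 1)

    # Adaptive strategy: explore for t - c - 1 rounds, call the oracle only if needed.
    p_found = (t - c - 1) / n
    adaptive = p_found * (v2 + t - 1) + (1 - p_found) * (v2 + t - c - 1)
    return max(nc, of), adaptive


non_adaptive, adaptive = golden_game_gains(10**6)
ratio = non_adaptive / adaptive
limit = 2 / (5 ** 1.5 - 9)  # the constant of Corollary 6
```

For large `n`, `ratio` approaches `limit`, and `adaptive / n` approaches the constant of Theorem 5. Since the adaptive strategy analyzed here need not be optimal, the ratio is an upper bound on the fraction that non-adaptive strategies can secure.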